Why symbolic execution?


In the old days


* Static analysis
* Dynamic analysis


Static analysis 




* objdump 
* IDA PRO 


[Screenshot: IDA Pro functions window listing auto-named functions (sub_401000, sub_401050, sub_4010A0, ...) next to imports such as WSACleanup and WSAGetLastError]


Dynamic analysis 


* GDB
* ltrace
* strace


[Screenshots: an ltrace session (puts, printf, fflush, read, __isoc99_scanf calls), a GDB session on crackme_hash_32 (break main, Breakpoint 1 at 0x8048490), and strace output (access, mmap, read calls)]


My brain is going to explode 


Symbolic execution!!! 
What is symbolic execution?


Symbolic execution 


* Symbolic execution is a means of analyzing a program to determine
what inputs cause each part of a program to execute.


* System-level
  * S2E (https://github.com/dslab-epfl/s2e)
* User-level
  * Angr (http://angr.io/)
  * Triton (https://triton.quarkslab.com/)
* Code-based
  * KLEE (http://klee.github.io/)


Symbolic execution 


Triton 


* Website: https://triton.quarkslab.com/
* A dynamic binary analysis framework written in C++.
  * Developed by Jonathan Salwan
  * Python bindings
* Triton components:
  * Symbolic execution engine
  * Tracer
  * AST representations
  * SMT solver interface


Triton 


* Structure
* Symbolic execution engine
* Triton Tracer
* AST representations
* Static single assignment form (SSA form)
* Symbolic variables
* SMT solver interface
* Example


Structure 


LibTriton.so 


Symbolic execution engine 


* The symbolic engine maintains:
  * a table of symbolic register states
  * a map of symbolic memory states
  * a global set of all symbolic references


Step | Register    | Instruction | Set of symbolic expressions
init | eax = UNSET | None        | (empty)
1    | eax = φ1    | mov eax, 0  | {φ1=0}
2    | eax = φ2    | inc eax     | {φ1=0, φ2=φ1+1}
3    | eax = φ3    | add eax, 5  | {φ1=0, φ2=φ1+1, φ3=φ2+5}


Triton Tracer 


* Tracer provides: 
* Current opcode executed 
* State context (register and memory) 


* Translate the control flow into AST Representations 
* Pin tracer support 


AST representations 


* Triton converts the x86 and x86-64 instruction set semantics into
AST representations.


* Triton's expressions are in SSA form.
* Instruction: add rax, rdx


* Expression: ref!41 = (bvadd ((_ extract 63 0) ref!40) ((_ extract 63 0)
ref!39))


* ref!41 is the new expression of the RAX register.
* ref!40 is the previous expression of the RAX register.
* ref!39 is the previous expression of the RDX register.


* mov al, 1
* mov cl, 10
* mov dl, 20
* xor cl, dl
* add al, cl
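
As a rough sketch of the idea, the five instructions above can be fed to a toy engine that records one numbered expression per register write. This is not Triton's API: the `ToyEngine` class and its method names are made up for illustration, and the expressions are plain strings rather than real AST nodes.

```python
# Toy model of an engine that records one SSA expression per register write.
# NOT Triton's API; the class and method names are illustrative assumptions.

class ToyEngine:
    def __init__(self):
        self.exprs = []   # exprs[i] holds the text of ref!i
        self.reg = {}     # register name -> id of its latest expression

    def _new(self, text):
        self.exprs.append(text)
        return len(self.exprs) - 1

    def mov_imm(self, dst, value):
        # A constant load creates a fresh 8-bit bitvector expression.
        self.reg[dst] = self._new("(_ bv%d 8)" % value)

    def binop(self, op, dst, src):
        # The new expression refers to the previous expressions of both operands.
        text = "(%s ref!%d ref!%d)" % (op, self.reg[dst], self.reg[src])
        self.reg[dst] = self._new(text)

    def dump(self):
        return ["ref!%d = %s" % (i, t) for i, t in enumerate(self.exprs)]


engine = ToyEngine()
engine.mov_imm("al", 1)            # mov al, 1
engine.mov_imm("cl", 10)           # mov cl, 10
engine.mov_imm("dl", 20)           # mov dl, 20
engine.binop("bvxor", "cl", "dl")  # xor cl, dl
engine.binop("bvadd", "al", "cl")  # add al, cl

for line in engine.dump():
    print(line)
```

The last line printed is the expression for al, which refers back to the xor result by its reference number instead of repeating it.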


AST representations 


Static single assignment form(SSA form) 


Each variable is assigned exactly once
* y := 1
* y := 2
* x := y

Turns into

* y1 := 1
* y2 := 2
* x1 := y2


Static single assignment form(SSA form) 


y1 := 1 (This assignment is not necessary)
y2 := 2
x1 := y2


* When Triton processes instructions, it can ignore such unnecessary
instructions.
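
A minimal sketch of both steps, assuming assignments are given as (target, source) pairs: `to_ssa` does the renaming, and `dead_assignments` walks backwards from the values we care about to find assignments that nothing depends on. Both helper names are hypothetical and have nothing to do with Triton's internals.

```python
def to_ssa(assignments):
    """Rename (target, source) pairs so every target is assigned exactly once."""
    version = {}   # variable name -> current version number
    out = []
    for target, source in assignments:
        # A source that names a variable refers to that variable's latest version.
        if source in version:
            source = "%s%d" % (source, version[source])
        version[target] = version.get(target, 0) + 1
        out.append(("%s%d" % (target, version[target]), source))
    return out


def dead_assignments(ssa, live_out):
    """Return targets that do not contribute to any name in live_out."""
    needed = set(live_out)
    dead = []
    for target, source in reversed(ssa):
        if target in needed:
            needed.add(source)   # whatever feeds a needed value is needed too
        else:
            dead.append(target)
    return dead[::-1]


ssa = to_ssa([("y", "1"), ("y", "2"), ("x", "y")])
print(ssa)
print(dead_assignments(ssa, {"x1"}))
```

With x1 as the only value of interest, the only assignment reported as unnecessary is y1.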


Symbolic variables 


* Think of symbolic state as an infection: if any operand of an
instruction is symbolic, the register or memory location that the
instruction writes becomes symbolic too.


* In Triton, the following methods manipulate it:
  * convertRegisterToSymbolicVariable(const triton::arch::Register &reg)
  * isRegisterSymbolized(const triton::arch::Register &reg)


Symbolic variables 


1. Make ecx a symbolic variable
  * convertRegisterToSymbolicVariable(Triton.registers.rcx)
  * isRegisterSymbolized(Triton.registers.rcx) == True


Symbolic variables


1. Make ecx a symbolic variable
2. test ecx, ecx
  * ZF = AND(ecx, ecx) == 0
  * If ecx == 0:
    * Set ZF to 1
  * Else:
    * Set ZF to 0


Symbolic variables


1. Make ecx a symbolic variable
2. test ecx, ecx
3. je +7 (eip)
  * If ZF == 1:
    * Jump to nop
  * Else:
    * Execute next instruction
  * isRegisterSymbolized(Triton.registers.eip) == True
4. mov edx, 0x64
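
The walkthrough above can be mimicked with a few lines of set bookkeeping. This is only a sketch of the "infection" rule, assuming each instruction is reduced to a destination and the list of locations it reads; the `propagate` function is hypothetical and is not how Triton tracks symbolic state internally.

```python
def propagate(symbolic, instructions):
    """Spread symbolic status through (dst, sources) instructions, in order."""
    symbolic = set(symbolic)
    for dst, sources in instructions:
        if any(src in symbolic for src in sources):
            symbolic.add(dst)       # the destination gets "infected"
        else:
            symbolic.discard(dst)   # overwritten with a fully concrete value
    return symbolic


program = [
    ("zf", ["ecx"]),   # test ecx, ecx  (ZF depends on ecx)
    ("eip", ["zf"]),   # je +7          (the next eip depends on ZF)
    ("edx", []),       # mov edx, 0x64  (constant, nothing symbolic is read)
]

print(sorted(propagate({"ecx"}, program)))
```

Starting with only ecx symbolic, the flag and the instruction pointer end up symbolic, while edx stays concrete.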


SMT solver Interface 


Example 


* Defcamp 2015 r100
* The program asks for a password
* The password can be up to 255 characters long
* It then runs a series of operations to check whether the password is correct


Defcamp 2015 r100 


int __cdecl main(int argc, const char **argv, const char **envp)
{
  int result; // eax@3
  __int64 v4; // rcx@6
  char s; // [sp+0h] [bp-110h]@1
  __int64 v6; // [sp+108h] [bp-8h]@1

  v6 = *MK_FP(__FS__, 40LL);
  printf("Enter the password: ", argv, envp);
  if ( fgets(&s, 255, stdin) )
  {
    if ( (unsigned int)sub_4006FD((__int64)&s) )
    {
      puts("Incorrect password!");
      result = 1;
    }
    else
    {
      puts("Nice!");
      result = 0;
    }
  }
  else
  {
    result = 0;
  }
  v4 = *MK_FP(__FS__, 40LL) ^ v6;
  return result;
}

Defcamp 2015 r100 


signed __int64 __fastcall sub_4006FD(char *a1)
{
  signed int i; // [sp+14h] [bp-24h]@1
  char v3[8]; // [sp+18h] [bp-20h]@1
  char v4[8]; // [sp+20h] [bp-18h]@1
  char v5[8]; // [sp+28h] [bp-10h]@1

  *(_QWORD *)v3 = "Dufhbmf";
  *(_QWORD *)v4 = "pG`imos";
  *(_QWORD *)v5 = "ewUglpt";
  for ( i = 0; i <= 11; ++i )
  {
    if ( *(_BYTE *)(*(_QWORD *)&v3[8 * (i % 3)] + 2 * (i / 3)) - a1[i] != 1 )
      return 1LL;
  }
  return 0LL;
}
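
Before bringing in a symbolic engine, it helps to see what the check actually computes. The sketch below mirrors the decompiled loop in plain Python and inverts it by hand. The three table strings are written as I read them from the r100 binary, which is an assumption wherever the listing above is hard to read, and the function names are made up.

```python
# Table strings as read from the r100 binary (an assumption where the
# decompiled listing is hard to read).
TABLE = [b"Dufhbmf", b"pG`imos", b"ewUglpt"]


def check(candidate):
    """Mirror of the decompiled check: 0 means accepted, 1 means rejected."""
    for i in range(12):
        if TABLE[i % 3][2 * (i // 3)] - candidate[i] != 1:
            return 1
    return 0


def recover():
    """Invert the check: each input byte is the selected table byte minus one."""
    return bytes(TABLE[i % 3][2 * (i // 3)] - 1 for i in range(12))


password = recover()
print(password.decode())
print(check(password))
```

A symbolic engine reaches the same 12 bytes without anyone reading the loop, which is the point of the Triton walkthrough that follows.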


Defcamp 2015 r100 


* Import Triton and initialize the Triton context
* Set the architecture
* Load segments into Triton
* Define a fake stack (RBP and RSP)
* Symbolize user input
* Start processing opcodes
* Set a constraint at a specific point in the program
* Get the symbolic expression and solve it


Import Triton and initialize Triton context 


from triton import ARCH, TritonContext, Instruction, MODE, MemoryAccess, CPUSIZE 


Triton = TritonContext() 


Set Architecture


Triton.setArchitecture(ARCH.X86_64)


Load segments into triton 


def loadBinary(path):
    import lief
    binary = lief.parse(path)
    phdrs = binary.segments
    for phdr in phdrs:
        size = phdr.physical_size
        vaddr = phdr.virtual_address
        print('[+] Loading 0x%06x - 0x%06x' % (vaddr, vaddr + size))
        # Copy the segment bytes to the same virtual address inside Triton
        Triton.setConcreteMemoryAreaValue(vaddr, phdr.content)
    return


Define fake stack ( RBP and RSP ) 


# Define a fake stack 
Triton.setConcreteRegisterValue(Triton.registers.rbp, 0x7fffffff)
Triton.setConcreteRegisterValue(Triton.registers.rsp, 0x6fffffff)


Symbolize user input 


# Define a user input: rdi points to the input buffer
Triton.setConcreteRegisterValue(Triton.registers.rdi, 0x10000000)

# Symbolize user inputs (30 bytes)
for index in range(30):
    Triton.convertMemoryToSymbolicVariable(MemoryAccess(0x10000000+index, CPUSIZE.BYTE))


Start processing opcodes


def emulate(pc):
    while pc:
        # Fetch opcode (16 bytes is enough for any x86-64 instruction)
        opcode = Triton.getConcreteMemoryAreaValue(pc, 16)

        # Create the Triton instruction
        instruction = Instruction()
        instruction.setOpcode(opcode)
        instruction.setAddress(pc)

        # Process
        Triton.processing(instruction)

        # Next
        pc = Triton.getConcreteRegisterValue(Triton.registers.rip)

emulate(0x4006FD)


Get symbolic expression and solve it 


# This check runs inside the emulate() loop, after Triton.processing(instruction).
astCtxt = Triton.getAstContext()

# 40078B: cmp eax, 1
# eax must be equal to 1 at each round.
if instruction.getAddress() == 0x40078B:
    # Slice expressions
    rax = Triton.getSymbolicExpressionFromId(Triton.getSymbolicRegisterId(Triton.registers.rax))
    eax = astCtxt.extract(31, 0, rax.getAst())

    # Define constraint
    cstr = astCtxt.land([
        Triton.getPathConstraintsAst(),
        astCtxt.equal(eax, astCtxt.bv(1, 32))
    ])

    print('[+] Asking for a model, please wait...')
    model = Triton.getModel(cstr)
    for k, v in model.items():
        value = v.getValue()
        Triton.setConcreteSymbolicVariableValue(Triton.getSymbolicVariableFromId(k), value)
        print('[+] Symbolic variable %02d = %02x (%c)' % (k, value, chr(value)))


Some problems of Triton 


* The whole procedure is too complicated.
* Triton has a steep learning curve.
* With debugger support, many steps can be simplified.


SymGDB 


* Repo:
https://github.com/SQLab/symgdb
* Symbolic execution support for GDB
* Combined with:
  * GDB Python API
  * Triton
* Symbolic environment
  * symbolize argv



Design and Implementation 


* GDB Python API
* Failed method
* Successful method
* Flow
* SymGDB System Structure
* Implementation of System Internals
* Relationship between SymGDB classes
* Supported Commands
* Symbolic Execution Process in GDB
* Symbolic Environment
  * symbolic argv
* Debug tips
* Demo


GDB Python API 


* API: https://sourceware.org/gdb/onlinedocs/gdb/Python-API.html
* Source the Python script in .gdbinit


* Functionalities:
  * Register a GDB command
  * Register an event handler (e.g., breakpoint)
  * Execute a GDB command and get its output
  * Read, write, and search memory


Register GDB command 


class Triton(gdb.Command):
    def __init__(self):
        super(Triton, self).__init__("triton", gdb.COMMAND_DATA)

    def invoke(self, arg, from_tty):
        Symbolic().run()

Triton()


Register event handler 


def breakpoint_handler(event):
    GdbUtil().reset()
    Arch().reset()

gdb.events.stop.connect(breakpoint_handler)


Execute GDB command and get output 


def get_stack_start_address(self):
    out = gdb.execute("info proc all", to_string=True)
    line = out.splitlines()[-1]
    pattern = re.compile("(0x[0-9a-f]*)")
    matches = pattern.findall(line)
    return int(matches[0], 0)


Read memory 


def get_memory(self, address, size):
    """
    Get memory content from gdb
    Args:
        - address: start address of memory
        - size: address length
    Returns:
        - list of memory content
    """
    return map(ord, list(gdb.selected_inferior().read_memory(address, size)))


Write memory 


def inject_to_gdb(self):
    for address, size in self.symbolized_memory:
        self.log("Memory updated: %s-%s" % (hex(address), hex(address + size)))
        for index in range(size):
            memory = chr(TritonContext.getSymbolicMemoryValue(MemoryAccess(address + index, CPUSIZE.BYTE)))
            gdb.selected_inferior().write_memory(address + index, memory, CPUSIZE.BYTE)


Failed method 


* At first, I tried to use Triton callbacks to get memory and register values


* Register callbacks:
  * needConcreteMemoryValue(const triton::arch::MemoryAccess& mem)
  * needConcreteRegisterValue(const triton::arch::Register& reg)


* Process the following sequence of code
  * mov eax, 5
  * mov ebx, eax (triggers needConcreteRegisterValue)


* We need to set eax in the Triton context


Triton callbacks


def needConcreteMemoryValue(TritonContext, mem):
    mem_addr = mem.getAddress()
    mem_size = mem.getSize()
    mem_val = TritonContext.getConcreteMemoryValue(MemoryAccess(mem_addr, mem_size))
    TritonContext.setConcreteMemoryValue(MemoryAccess(mem_addr, mem_size, mem_val))

def needConcreteRegisterValue(TritonContext, reg):
    reg_name = reg.getName()
    reg_val = TritonContext.getConcreteRegisterValue(getattr(TritonContext.registers, reg_name))
    TritonContext.setConcreteRegisterValue(getattr(TritonContext.registers, reg_name), reg_val)

TritonContext.addCallback(needConcreteMemoryValue, CALLBACK.GET_CONCRETE_MEMORY_VALUE)
TritonContext.addCallback(needConcreteRegisterValue, CALLBACK.GET_CONCRETE_REGISTER_VALUE)


Problems 


* Values from GDB are out of date
* Consider the following sequence of code

mov eax, 5
* We set a breakpoint here and call Triton's processing()
mov ebx, eax (triggers the callback to get the eax value, eax = 5)
mov eax, 10
mov ecx, eax (triggers the callback again, still gets eax = 5)

* GDB is still stopped at the breakpoint, so the state it reports is not
up to date with what Triton has already processed
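
The stale-value problem can be reproduced without GDB or Triton. In the sketch below, `FakeDebugger` stands in for GDB stopped at the breakpoint, and two tiny interpreters process the same three instructions: one refreshes every register read from the debugger (the callback approach), the other copies the debugger state once and then works on its own copy. All names are hypothetical.

```python
class FakeDebugger:
    """Stands in for GDB stopped at a breakpoint; its registers never change."""
    def __init__(self, regs):
        self.regs = dict(regs)


def run_with_callback(debugger, program):
    """Callback approach: every register read is refreshed from the debugger."""
    ctx, reads = {}, []
    for op, dst, src in program:
        if op == "mov_imm":
            ctx[dst] = src
        else:  # mov_reg
            ctx[src] = debugger.regs[src]   # callback overwrites the emulator's value
            ctx[dst] = ctx[src]
            reads.append(ctx[dst])
    return reads


def run_with_snapshot(debugger, program):
    """Copy approach: take the debugger state once, then never consult it again."""
    ctx, reads = dict(debugger.regs), []
    for op, dst, src in program:
        if op == "mov_imm":
            ctx[dst] = src
        else:  # mov_reg
            ctx[dst] = ctx[src]
            reads.append(ctx[dst])
    return reads


stopped = FakeDebugger({"eax": 5})          # state right after "mov eax, 5"
program = [
    ("mov_reg", "ebx", "eax"),              # mov ebx, eax
    ("mov_imm", "eax", 10),                 # mov eax, 10
    ("mov_reg", "ecx", "eax"),              # mov ecx, eax
]

print(run_with_callback(stopped, program))
print(run_with_snapshot(stopped, program))
```

The callback version reads 5 both times, while the snapshot version sees the update to 10 on the second read.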


Tried solutions 


* Before fetching a needed value from GDB, check whether it is already
in Triton's context


Not working!
Triton falls into an infinite loop


Successful method 


* Copy the GDB context into Triton
* Load all the segments into the Triton context
* Symbolic execution won't affect the original GDB state
* The user can restart symbolic execution from the breakpoint


Flow 


* Get the debugged program state by calling the GDB Python API
* Pass the current program state to Triton
* Set symbolic variables
* Set the target address
* Run symbolic execution and get the output
* Inject the answer back into the debugged program state


SymGDB System Structure 


[Diagram: GDB, the debugged program (registers, memory), SymGDB, and Triton; labeled flows: target address, inject back answer]


Implementation of System Internals 


* Three classes in SymGDB
  * Arch(), GdbUtil(), Symbolic()


* Arch()
  * Provides architecture-specific pointer sizes and register names


* GdbUtil()
  * Read and write memory, read and write registers
  * Get the memory mapping of the program
  * Get the filename and detect the architecture
  * Get the argument list


* Symbolic()
  * Set a constraint on the pc register
  * Run symbolic execution

Relationship between SymGDB classes 


[Diagram: Arch, GdbUtil, and Symbolic; labeled flows: architecture, architecture related values, context, answer]


Supported Commands 


Command     Option                    Functionality
symbolize   argv                      Make symbolic
            memory [address][size]
debug       symbolic                  Show debug messages
            gdb


Symbolic Execution Process in GDB 


* gdb.execute("info registers", to_string=True) to get registers
* gdb.selected_inferior().read_memory(address, length) to get memory


* setConcreteMemoryAreaValue and setConcreteRegisterValue to set the
Triton state


* At each instruction, use isRegisterSymbolized to check whether the pc
register is symbolized


* Set the target address as a constraint
* Call getModel to get the answer


* gdb.selected_inferior().write_memory(address, buf, length) to inject the
answer back into the debugged program state


Symbolic Environment: symbolic argv 


* Use "info proc all" to get the stack start address
* Examine the memory content from the stack start address

argc
argv[0], argv[1], ...     program args (pointers)
null                      end of args (integer)
env[0], env[1], ...       environment variables (pointers)
null                      end of environment (integer)
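
A simplified sketch of that walk, run against a fabricated memory image rather than a live process: `build_stack` lays out argc, the argv pointers, a null, the environment pointers, and a second null, and `find_argv` recovers the address and length of each argument so it could be symbolized. The base address 0x1000, the 4-byte word size, and both function names are assumptions made for the example.

```python
import struct


def build_stack(args, env, base=0x1000, word=4):
    """Lay out argc, argv pointers, null, env pointers, null, then the strings."""
    fmt = "<I" if word == 4 else "<Q"
    strings = [s.encode() + b"\0" for s in args + env]
    table_size = word * (1 + len(args) + 1 + len(env) + 1)
    addr, pointers = base + table_size, []
    for s in strings:
        pointers.append(addr)
        addr += len(s)
    words = [len(args)] + pointers[:len(args)] + [0] + pointers[len(args):] + [0]
    return b"".join(struct.pack(fmt, w) for w in words) + b"".join(strings)


def find_argv(memory, base=0x1000, word=4):
    """Return (address, length) of every argv string by walking from argc."""
    fmt = "<I" if word == 4 else "<Q"
    argc = struct.unpack_from(fmt, memory, 0)[0]
    found = []
    for i in range(argc):
        ptr = struct.unpack_from(fmt, memory, word * (1 + i))[0]
        end = memory.index(b"\0", ptr - base)   # strings are NUL-terminated
        found.append((ptr, end - (ptr - base)))
    return found


stack = build_stack(["./crackme_hash_32", "aaaaa"], ["HOME=/root"])
for address, length in find_argv(stack):
    print(hex(address), length)
```

The second pair is the region to symbolize for argv[1]: five bytes, matching the "aaaaa" placeholder argument.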


Debug tips 
* Simplify (each expression is followed by its simplified form):


(bvxor (_ bv1 8) (_ bv1 8))
(_ bv0 8)


(bvor (bvand (_ bv1 8) (bvnot (_ bv2 8))) (bvand (bvnot (_ bv1 8)) (_ bv2 8)))
(bvxor (_ bv1 8) (_ bv2 8))


(bvor (bvand (bvnot (_ bv2 8)) (_ bv1 8)) (bvand (bvnot (_ bv1 8)) (_ bv2 8)))
(bvxor (_ bv1 8) (_ bv2 8))


(bvor (bvand (bvnot (_ bv2 8)) (_ bv1 8)) (bvand (_ bv2 8) (bvnot (_ bv1 8))))
(bvxor (_ bv1 8) (_ bv2 8))


(bvor (bvand (_ bv2 8) (bvnot (_ bv1 8))) (bvand (bvnot (_ bv2 8)) (_ bv1 8)))
(bvxor (_ bv2 8) (_ bv1 8))
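
The rewrites above all rely on the same identity: (a AND NOT b) OR (NOT a AND b) equals a XOR b. For 8-bit values this is small enough to confirm exhaustively; the two function names below are made up for the check.

```python
def original(a, b):
    """(a AND NOT b) OR (NOT a AND b), truncated to 8 bits."""
    return ((a & ~b) | (~a & b)) & 0xFF


def simplified(a, b):
    """a XOR b, truncated to 8 bits."""
    return (a ^ b) & 0xFF


# Compare the two forms on every pair of 8-bit values.
mismatches = [(a, b) for a in range(256) for b in range(256)
              if original(a, b) != simplified(a, b)]
print(len(mismatches))
```

An empty mismatch list means the simplified form can replace the longer one anywhere in an 8-bit expression.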


Demo 


* Examples
  * crackme hash
  * crackme xor


* GDB commands
* Combined with Peda


crackme hash 


* Source:
https://github.com/illera88/Ponce/blob/master/examples/crackme_hash.cpp
* The program passes argv[1] to the check function
* In the check function, each byte of argv[1] is XORed with the serial (a
fixed string) and added to a hash that starts at 0xABCD
* If the resulting hash equals 0xad6d
  * print "Win"
* else
  * print "fail"
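
To make the target condition concrete, here is the same arithmetic in Python with a small brute-force search standing in for the solver. The constants come from the crackme_hash source listing; the fixed prefix "aaa", the restriction to printable bytes, and the function names are arbitrary choices for the sketch.

```python
SERIAL = b"\x31\x3e\x3d\x26\x31"


def check(data):
    """Same arithmetic as check() in crackme_hash: 0xABCD plus the XORed bytes."""
    total = 0xABCD
    for i, byte in enumerate(data):
        total += byte ^ SERIAL[i % 5]
    return total


def find_input(prefix=b"aaa", target=0xAD6D):
    """Try every pair of printable bytes after a fixed prefix (a stand-in for a solver)."""
    printable = range(0x21, 0x7F)
    for c3 in printable:
        for c4 in printable:
            candidate = prefix + bytes([c3, c4])
            if check(candidate) == target:
                return candidate
    return None


winner = find_input()
print(winner, hex(check(winner)))
```

Many inputs satisfy the sum, so a solver is free to return a different answer than this search does.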


crackme hash 


#include <stdio.h>
#include <stdlib.h>
char *serial = "\x31\x3e\x3d\x26\x31";
int check(char *ptr)
{
    int i;
    int hash = 0xABCD;
    for (i = 0; ptr[i]; i++)
        hash += ptr[i] ^ serial[i % 5];
    return hash;
}
int main(int ac, char **av)
{
    int ret;
    if (ac != 2)
        return -1;
    ret = check(av[1]);
    if (ret == 0xad6d)
        printf("win\n");
    else
        printf("fail\n");
    return 0;
}


crackme hash 


.text:080484A1 loc_80484A1:                            ; CODE XREF: main+16↑j
.text:080484A1                 mov     eax, [eax+4]
.text:080484A4                 add     eax, 4
.text:080484A7                 mov     eax, [eax]
.text:080484A9                 push    eax             ; char *
.text:080484AA                 call    _Z5checkPc      ; check(char *)
.text:080484AF                 add     esp, 4
.text:080484B2                 mov     [ebp+var_C], eax
.text:080484B5                 cmp     [ebp+var_C], 0AD6Dh
.text:080484BC                 jnz     short loc_80484D0
.text:080484BE                 sub     esp, 0Ch
.text:080484C1                 push    offset s        ; "Win"
.text:080484C6                 call    _puts
.text:080484CB                 add     esp, 10h
.text:080484CE                 jmp     short loc_80484E0


crackme hash 


[Demo video: crackme hash]


crackme xor 


* Source:
https://github.com/illera88/Ponce/blob/master/examples/crackme_xor.c
* The program passes argv[1] to the check function
* In the check function, each byte of argv[1] is decremented by 1 and
XORed with 0x55
* If the result does not equal the matching byte of the serial (a fixed string)
  * return 1
  * print "fail"
* else
  * go to the next loop iteration
* If the program goes through the whole loop
  * return 0
  * print "Win"
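
Because every byte is checked independently, this crackme can also be inverted by hand, which gives a known-good answer to compare the symbolic result against. The sketch below reuses the serial bytes and the per-byte transform from the crackme_xor source; the function names are made up.

```python
SERIAL = b"\x31\x3e\x3d\x26\x31"


def check(data):
    """Mirror of check() in crackme_xor: 0 means every byte matched."""
    for i in range(5):
        if ((data[i] - 1) ^ 0x55) != SERIAL[i]:
            return 1
    return 0


def invert():
    """Undo the transform byte by byte: XOR with 0x55, then add 1."""
    return bytes((s ^ 0x55) + 1 for s in SERIAL)


key = invert()
print(key.decode())
print(check(key))
```

Unlike crackme hash, there is exactly one accepted five-byte input here, so a solver has to return this same string.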


crackme xor 


#include <stdio.h>
#include <stdlib.h>
char *serial = "\x31\x3e\x3d\x26\x31";
int check(char *ptr)
{
    int i = 0;
    while (i < 5){
        if (((ptr[i] - 1) ^ 0x55) != serial[i])
            return 1;
        i++;
    }
    return 0;
}
int main(int ac, char **av)
{
    int ret;
    if (ac != 2)
        return -1;
    ret = check(av[1]);
    if (ret == 0)
        printf("win\n");
    else
        printf("fail\n");
    return 0;
}


crackme xor 


.text:08048418 loc_8048418:                            ; CODE XREF: check(char *)+49↓j
.text:08048418                 cmp     [ebp+var_4], 4
.text:0804841C                 jg      short loc_8048456
.text:0804841E                 mov     edx, [ebp+var_4]
.text:08048421                 mov     eax, [ebp+arg_0]
.text:08048424                 add     eax, edx
.text:08048426                 movzx   eax, byte ptr [eax]
.text:08048429                 movsx   eax, al
.text:0804842C                 sub     eax, 1
.text:0804842F                 xor     eax, 55h
.text:08048432                 mov     ecx, eax
.text:08048434                 mov     edx, serial
.text:0804843A                 mov     eax, [ebp+var_4]
.text:0804843D                 add     eax, edx
.text:0804843F                 movzx   eax, byte ptr [eax]
.text:08048442                 movsx   eax, al
.text:08048445                 cmp     ecx, eax
.text:08048447                 jz      short loc_8048450
.text:08048449                 mov     eax, 1
.text:0804844E                 jmp     short locret_804845B
.text:08048450 loc_8048450:                            ; CODE XREF: check(char *)+3C↑j
.text:08048450                 add     [ebp+var_4], 1
.text:08048454                 jmp     short loc_8048418


crackme xor 


[Demo video: crackme xor]


GDB commands 


GDB command file (one per test program):

break main
symbolize argv
target 0x080484be
run aaaaa
triton
continue


Test script:

#!/bin/bash
DIR=$(dirname "$(readlink -f "$0")")
TESTS=(crackme_hash_32 crackme_hash_64 crackme_xor_32 crackme_xor_64)
for program in "${TESTS[@]}"
do
    gdb -x $DIR/$program $DIR/../examples/$program
done


GDB commands 


[Demo video: GDB commands]


Combined with Peda 


* Same demo as crackme hash
* Use find (a peda command) to find the address of argv[1]


* Use symbolize memory [argv[1] address] [argv[1] length] to make the
argv[1] memory symbolic


Combined with Peda 


[Demo video: combined with Peda]


Conclusion 


* GDB acts as the debugger that provides the program state, which saves
you the effort of setting up the essentials yourself.


* The SymGDB plugin is independent of the debugged program unless you
inject the answer back into it.


* With tracer support (i.e., GDB), we can do concolic execution.


Concolic Execution 


* Concolic = Concrete + Symbolic
* Uses both symbolic variables and concrete values


* It is fast. Compared to full emulation, we don't need to evaluate
memory or register state from SMT formulas; the values come directly
from the real CPU context.


Drawbacks of Triton 


* Triton doesn't support the GNU C library
* Why?


* SMT Semantics Supported:
https://triton.quarkslab.com/documentation/doxygen/SMT_Semantics_Supported_page.html


* Triton would have to implement the system call interface to support
the GNU C library, i.e., support "int 0x80"


* You have to do state traversal manually.


Comparison with other symbolic
execution frameworks


* KLEE
* Angr


KLEE 


* Symbolic virtual machine built on top of the LLVM compiler
infrastructure
* Website: http://klee.github.io
* GitHub: https://github.com/klee/klee
* KLEE paper: http://llvm.org/pubs/2008-12-OSDI-KLEE.pdf (worth
reading)
* Main goals of KLEE:
  1. Hit every line of executable code in the program
  2. Detect at each dangerous operation whether any input value exists
  that could cause an error


Introduction 


* KLEE is a symbolic virtual machine that generates test cases.


* Source code is needed, because the program must be compiled to LLVM bitcode.
* Steps:
  * Replace the input with a KLEE function call that makes the memory region symbolic
  * Compile the source code to LLVM bitcode
  * Run KLEE
  * Get the test cases and path information


get_sign.c
#include <klee/klee.h>

int get_sign(int x) {
    if (x == 0)
        return 0;
    if (x < 0)
        return -1;
    else
        return 1;
}

int main() {
    int a;
    klee_make_symbolic(&a, sizeof(a), "a");
    return get_sign(a);
}


get_sign.ll


define i32 @main() #0 {
  %1 = alloca i32, align 4
  %a = alloca i32, align 4
  store i32 0, i32* %1
  call void @llvm.dbg.declare(metadata !{i32* %a}, metadata !25), !dbg !26
  %2 = bitcast i32* %a to i8*, !dbg !27
  call void @klee_make_symbolic(i8* %2, i64 4, i8* getelementptr inbounds ([2 x i8]* @.str, i32 0, i32 0)), !dbg !27
  %3 = load i32* %a, align 4, !dbg !28
  %4 = call i32 @get_sign(i32 %3), !dbg !28
  ret i32 %4, !dbg !28
}


Result 


klee@561b436ff126:~/klee_src/examples/get_sign$ klee get_sign.bc
KLEE: output directory is "/home/klee/klee_src/examples/get_sign/klee-out-3"
KLEE: Using STP solver backend

KLEE: done: total instructions
KLEE: done: completed paths = 3
KLEE: done: generated tests = 3
klee@561b436ff126:~/klee_src/examples/get_sign$ ktest-tool ./klee-last/*.ktest
ktest file : './klee-last/test000001.ktest'
args       : ['get_sign.bc']
num objects: 1
object    0: name: b'a'
object    0: size: 4
object    0: data: b'\x00\x00\x00\x00'

ktest file : './klee-last/test000002.ktest'
args       : ['get_sign.bc']
num objects: 1
object    0: name: b'a'
object    0: size: 4
object    0: data: b'\x01\x01\x01\x01'

ktest file : './klee-last/test000003.ktest'
args       : ['get_sign.bc']
num objects: 1
object    0: name: b'a'
object    0: size: 4
object    0: data: b'\x00\x00\x00\x80'


Diagram


1. Step the program until it meets a branch
2. If all given operands are concrete, return a constant expression. If
not, record the current condition constraints and clone the state
3. Step the states until they hit an exit call or an error
4. Solve the conditional constraints
5. Loop until no states remain or a user-defined timeout is reached


After the first branch of get_sign, the two cloned states are:

Constraints: x != 0              Constraints: x == 0
Next instruction: if (x < 0)     Next instruction: return 0;
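
The loop above can be imitated for get_sign in a few lines of Python. This is a toy illustration, not how KLEE is implemented or invoked: branches are plain predicates, `solve` is a brute-force stand-in for an SMT solver over a tiny domain, and every name is made up.

```python
def solve(constraints, domain=range(-3, 4)):
    """Stand-in for an SMT solver: first value in a tiny domain meeting all constraints."""
    for x in domain:
        if all(pred(x) for _, pred in constraints):
            return x
    return None


# get_sign as a list of (condition text, predicate, value returned if taken).
BRANCHES = [
    ("x == 0", lambda x: x == 0, 0),
    ("x < 0", lambda x: x < 0, -1),
]
DEFAULT_RETURN = 1   # the final else


def explore():
    tests = []
    worklist = [(0, [])]            # (index of next branch, path constraints)
    while worklist:                 # loop until no states remain
        pc, constraints = worklist.pop()
        if pc == len(BRANCHES):     # fell through every branch
            tests.append((solve(constraints), DEFAULT_RETURN))
            continue
        name, pred, ret = BRANCHES[pc]
        # Clone the state: one copy takes the branch and exits...
        taken = constraints + [(name, pred)]
        tests.append((solve(taken), ret))
        # ...the other records the negated condition and keeps stepping.
        negated = constraints + [("not " + name, lambda x, p=pred: not p(x))]
        worklist.append((pc + 1, negated))
    return tests


print(explore())
```

Each tuple pairs a generated input with the value get_sign returns on that path, one test per path, matching the three completed paths KLEE reports for this program.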


What's the difference in KLEE


* KLEE introduces the concept of state; deeper paths can be reached by
stepping through the state tree.


* It seems to support the GNU C library?

What's the difference in KLEE 


* Starting from the current state, our final goal is to reach path D.


* In Triton
  * Solve the symbolic variable to reach path B
  * Set the concrete value and step to path B
  * Solve the symbolic variable to reach path D

* In KLEE
  * Record the condition constraints to path B
  * Clone the state
  * Solve the symbolic variable to reach path D


What's the difference in KLEE 


* When KLEE needs to deal with the GNU C library, run KLEE with the
--libc=uclibc --posix-runtime parameters.


* KLEE detects when the analyzed program makes an external call to a
library that isn't compiled to LLVM IR but is instead linked with the
program.


* Such a library call is only executed concretely, which means losing
symbolic information within the library call.


Angr 


* Website: http://angr.io/


* Angr is a Python framework for analyzing binaries. It combines both
static and dynamic symbolic ("concolic") analysis, making it applicable
to a variety of tasks.


* Supports various architectures


* Flow
  * Load a binary into the analysis program
  * Translate the binary into an intermediate representation (IR)
  * Perform the actual analysis


Flow 


* Import angr

import angr
import claripy

* Load the binary and initialize the angr project

project = angr.Project('./ais3_crackme')

* Define argv1 as a 100-byte bitvector

argv1 = claripy.BVS("argv1", 100*8)

* Initialize the state with argv1

state = project.factory.entry_state(args=["./crackme1", argv1])


Flow


* Initialize the simulation manager

simgr = project.factory.simgr(state)

* Explore until a state reaches the target address

simgr.explore(find=0x400602)

* Extract one state from the found states

found = simgr.found[0]

* Solve the expression with the solver

solution = found.solver.eval(argv1, cast_to=str)


ais3_crackme


* The binary can be found at:
https://github.com/angr/angr-doc/blob/master/examples/ais3_crackme/


* Run the binary with an argument


* If the argument is correct
  * print "Correct! that is the secret key!"
* else
  * print "I'm sorry, that's the wrong secret key!"


Target address 


.text:00000000004005EB loc_4005EB:                             ; CODE XREF: main+13↑j
.text:00000000004005EB                 mov     rax, [rbp+var_10]
.text:00000000004005EF                 add     rax, 8
.text:00000000004005F3                 mov     rax, [rax]
.text:00000000004005F6                 mov     rdi, rax
.text:00000000004005F9                 call    verify
.text:00000000004005FE                 test    eax, eax
.text:0000000000400600                 jz      short loc_40060E
.text:0000000000400602                 mov     edi, offset aCorrectThatIsT ; "Correct! that is the secret key!"
.text:0000000000400607                 call    _puts
.text:000000000040060C                 jmp     short loc_400618
.text:000000000040060E ; ---------------------------------------------------------------------------
.text:000000000040060E
.text:000000000040060E loc_40060E:                             ; CODE XREF: main+3B↑j
.text:000000000040060E                 mov     edi, offset aIMSorryThatSTh ; "I'm sorry, that's the wrong secret key!"


Solution 


import angr
import claripy

project = angr.Project("./ais3_crackme")
argv1 = claripy.BVS("argv1", 100*8)
state = project.factory.entry_state(args=["./crackme1", argv1])
simgr = project.factory.simgr(state)
simgr.explore(find=0x400602)
found = simgr.found[0]
solution = found.solver.eval(argv1, cast_to=str)
print(repr(solution))


Result 


apple ~ angr-doc examples ais3_crackme python solve.py
'ais3{I_tak3_g00d_n0t3s}\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00'


Intermediate Representation 


* To analyze and execute machine code from different CPU
architectures, Angr performs most of its analysis on an
intermediate representation


* Angr's intermediate representation is VEX (from Valgrind), since
lifting binary code into VEX is well supported


Intermediate Representation 


* The IR abstracts away several differences between architectures
  * Register names: VEX models the registers as a separate memory space, with
  integer offsets
  * Memory access: the IR abstracts away the different ways architectures
  access memory
  * Memory segmentation: some architectures support memory segmentation
  through special segment registers
  * Instruction side-effects: most instructions have side-effects


Intermediate Representation 


* addl %eax, %ebx

t3 = GET:I32(0)     # get %eax, a 32-bit integer
t2 = GET:I32(12)    # get %ebx, a 32-bit integer
t1 = Add32(t3,t2)   # addl
PUT(0) = t1         # put %eax
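
To show what "registers as a separate memory space with integer offsets" means in practice, here is a toy interpreter for just the four statements above. It is not pyvex or angr code: the regex-based parser and the `run_ir` name are made up, and the register file is a plain dict keyed by the offsets used on the slide (0 for %eax, 12 for %ebx).

```python
import re


def run_ir(statements, regfile):
    """Execute GET / Add32 / PUT statements against an offset-indexed register file."""
    temps = {}
    for stmt in statements:
        m = re.fullmatch(r"(t\d+) = GET:I32\((\d+)\)", stmt)
        if m:   # read a 32-bit register by its offset
            temps[m.group(1)] = regfile[int(m.group(2))]
            continue
        m = re.fullmatch(r"(t\d+) = Add32\((t\d+),(t\d+)\)", stmt)
        if m:   # 32-bit addition with wraparound
            temps[m.group(1)] = (temps[m.group(2)] + temps[m.group(3)]) & 0xFFFFFFFF
            continue
        m = re.fullmatch(r"PUT\((\d+)\) = (t\d+)", stmt)
        if m:   # write a temporary back to a register offset
            regfile[int(m.group(1))] = temps[m.group(2)]
            continue
        raise ValueError("unsupported statement: " + stmt)
    return regfile


ir = ["t3 = GET:I32(0)", "t2 = GET:I32(12)", "t1 = Add32(t3,t2)", "PUT(0) = t1"]
print(run_ir(ir, {0: 7, 12: 5}))
```

Nothing in the interpreter knows the names eax or ebx; only the offsets matter, which is what lets the same analysis run across architectures.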


Stash types


active
  This stash contains the states that will be stepped by default, unless an alternate stash is
  specified.

deadended
  A state goes to the deadended stash when it cannot continue the execution for some reason,
  including no more valid instructions, unsat state of all of its successors, or an invalid instruction
  pointer.

pruned
  When using LAZY_SOLVES, states are not checked for satisfiability unless absolutely necessary.
  When a state is found to be unsat in the presence of LAZY_SOLVES, the state hierarchy is traversed
  to identify when, in its history, it initially became unsat. All states that are descendants of that point
  (which will also be unsat, since a state cannot become un-unsat) are pruned and put in this stash.

unconstrained
  If the save_unconstrained option is provided to the SimulationManager constructor, states that are
  determined to be unconstrained (i.e., with the instruction pointer controlled by user data or some
  other source of symbolic data) are placed here.

unsat
  If the save_unsat option is provided to the SimulationManager constructor, states that are
  determined to be unsatisfiable (i.e., they have constraints that are contradictory, like the input
  having to be both "AAAA" and "BBBB" at the same time) are placed here.


What's the difference in Angr


* The state concept is more complete and categorized, and more
operations are available on states.


* Symbolic functions


Symbolic Function 


* Project tries to replace external calls to library functions with
symbolic summaries termed SimProcedures


* Because SimProcedures are library hooks written in Python, they can
be inaccurate


* If you encounter path explosion or inaccuracy, you can:
  1. Disable the SimProcedure
  2. Replace the SimProcedure with something written directly for the situation
  in question
  3. Fix the SimProcedure


Symbolic Function(scanf) 


* Source code:
https://github.com/angr/angr/blob/master/angr/procedures/libc/scanf.py
* Get the first argument (pointer to the format string)


1. Define the function return type according to the architecture
2. Parse the format string
3. According to the format string, read input from file descriptor 0 (i.e.,
standard input)
4. Do the read operation


Symbolic Function(scanf) 


class SimProcedure(object):
    @staticmethod
    def ty_ptr(self, ty):
        return SimTypePointer(self.arch, ty)


class FormatParser(SimProcedure):
    def _parse(self, fmt_idx):
        """
        fmt_idx: The index of the (pointer to the) format string in the arguments
        list.
        """

    def interpret(self, addr, startpos, args, region=None):
        """
        Interpret a format string, reading the data at addr in region into args
        starting at startpos.
        """


Symbolic Function(scanf) 


from angr.procedures.stubs.format_parser import FormatParser
from angr.sim_type import SimTypeInt, SimTypeString

class scanf(FormatParser):
    def run(self, fmt):
        self.argument_types = {0: self.ty_ptr(SimTypeString())}
        self.return_type = SimTypeInt(self.state.arch.bits, True)
        fmt_str = self._parse(0)
        f = self.state.posix.get_file(0)
        region = f.content
        start = f.pos
        (end, items) = fmt_str.interpret(start, 1, self.arg, region=region)
        # do the read, correcting the internal file position and logging the action
        self.state.posix.read_from(0, end - start)
        return items


def _parse(self, fmt_idx):


scanf:
int scanf ( const char * format, ... );
scanf ("%d",&i);
fmt_str = self._parse(0)

sscanf:
int sscanf ( const char * s, const char * format, ...);
sscanf (sentence,"%s %*s %d",str,&i);
fmt_str = self._parse(1)


def interpret(self, addr, startpos, args, region=None):


scanf:
int scanf ( const char * format, ... );
scanf ("%d",&i);
f = self.state.posix.get_file(0)
region = f.content
start = f.pos
(end, items) = fmt_str.interpret(start, 1, self.arg, region=region)

sscanf:
int sscanf ( const char * s, const char * format, ...);
sscanf (sentence,"%s %*s %d",str,&i);
_, items = fmt_str.interpret(self.arg(0), 2, self.arg, region=self.state.memory)